In this note we exemplify the application of metacell package to the lung development dataset, obtained using the MARS-seq protocol (Cohen et al. Cell 2018).

For further questions please contact Amir Giladi or Dikla Glebard Solodkin

Basic Metacell pipeline

Load the library, initialize a database and figures directory, define an id for the MetaCell objects

library("metacell")
if(!dir.exists("saved_work")) dir.create("saved_work/")
scdb_init("saved_work/", force_reinit=T)
#> initializing scdb to saved_work/
tgconfig::override_params("annotations/lung_params.yaml","metacell")
src_functions = "scripts/mc_modeling_functions.R"
if(!dir.exists("results")) dir.create("results/")
scfigs_init("results/")
if(!dir.exists("results/figure1")) dir.create("results/figure1")
index_fn = "annotations/lung_fn.txt"
id = "lung_kinetics"
ord_id = "lung_kinetics_sorted"

force_reinit=T instruct the system to override existing database objects.
This can be important if you are running in cycles and would like to update your objects. Otherwise, the database reuses loaded objects to save time on reading and initializing them from the disk.
tgconfig::override_params() overrides default metacell configurations.

We will read multiple MARS umi matrices (umi.tab) and merge them, based on a table defining the datasets
@param mat_nm defines the ID of the matrix object (and is going to be the name of all the objects from now on)
@param base_dir defines the umitab directory
@param mat_nm defines the name (id) of matrix
@param datasets_table_fn defines the index file of the MARS multi batch dataset. This is a tab delimited text file, with an arbitrary number of columns and a header line.
The three mandatory fields are:
Amp.Batch.ID - specify the ID of the batch defined by the row, and also the file name (without the .txt suffix) of the respective umi table in the base_dir provided.
Seq.Batch.ID - efines and ID of the sequencing batch (may be relevant for further noise cleanups beyond those done in the low-level pipeline).
Batch.Set.ID - The third id group different batches into sets for downstream analysis (e.g. QC and more).

Let us take a look at our index file:

Amp.Batch.ID Seq.Batch.ID Batch.Set.ID Description sorting.scheme developmental.time kinetic.group Day.of.birth C.section treatment coculture tissue Replicate injection.date reads reads_with_known_cell_barcode reads_bad_quality_well_barcode reads.spike_mapped reads.gene_mapped spike_yield gene_umis singleton_gene_umis spike_umis gene_umis_neg_control noise_estimation avg_noffsets_per_rmt avg_reads_per_rmt avg_noffsets_per_rmt_neg_ctrl avg_reads_per_rmt_neg_ctrl spike_count
AB1183 SB1 Lung_E16.5_CD45- E16.5(1)_CD45-_ind3 CD45- E16.5 E16.5 NA NA NA NA lung 1 NA 71234 68037 0 717 7517 2.4 1985 1091 189 144 0.073 2.80 3.79 2.54 4.07 0
AB1185 SB1 Lung_d7_CD45+ d7(1)_CD45+_ind1 CD45+ PN_7d P_d7 19.5 NA NA NA lung 1 NA 74433 71149 0 1241 8917 2.5 1614 820 195 98 0.061 3.85 5.52 2.59 4.41 0
AB1186 SB1 Lung_d7_CD45+ d7(1)_CD45+_ind2 CD45+ PN_7d P_d7 19.5 NA NA NA lung 1 NA 71097 67816 0 1600 10533 2.8 1567 814 215 151 0.096 4.24 6.71 3.04 5.99 0
AB1245 SB1 Lung_NB_CD45+ NB(1)_CD45+_ind1 CD45+ PN_10h P_early 19.5 NA NA NA lung 1 NA 79267 75916 0 1524 15302 2.1 1630 680 163 98 0.060 5.38 9.38 3.92 8.23 0
AB1246 SB1 Lung_NB_CD45+ NB(1)_CD45+_ind2 CD45+ PN_10h P_early 19.5 NA NA NA lung 1 NA 90316 86600 0 1586 17509 2.1 1865 806 163 87 0.047 5.40 9.39 3.83 7.67 0
AB1247 SB1 Lung_NB_CD45- NB(1)_CD45-_ind1 CD45- PN_10h P_early 19.5 NA NA NA lung 1 NA 86303 82773 0 2099 13442 2.6 1054 510 201 80 0.076 5.62 12.60 4.19 11.67 0

Let’s load a matrix to the system:

umi.tab_dir = "output/umi.tab/"
mcell_import_multi_mars(mat_nm = "all", dataset_table_fn = index_fn, base_dir = umi.tab_dir, force = T)
#> will read AB1183
#> will read AB1185
#> will read AB1186
#> will read AB1245
#> will read AB1246
#> will read AB1247
#> will read AB1248
#> will read AB1249
#> will read AB1250
#> will read AB1251
#> will read AB1409
#> will read AB1410
#> will read AB1411
#> will read AB1412
#> will read AB1433
#> will read AB1434
#> will read AB1435
#> will read AB1436
#> will read AB1439
#> will read AB1484
#> will read AB1485
#> will read AB1486
#> will read AB1487
#> will read AB1488
#> will read AB1489
#> will read AB1490
#> will read AB1491
#> will read AB1534
#> will read AB1535
#> will read AB1536
#> will read AB1537
#> will read AB1538
#> will read AB1829
#> will read AB1830
#> will read AB1852
#> will read AB1853
#> will read AB1854
#> will read AB1855
#> will read AB1856
#> will read AB1857
#> will read AB1858
#> will read AB1859
#> will read AB1906
#> will read AB1907
#> will read AB1908
#> will read AB1997
#> will read AB1998
#> will read AB1999
#> will read AB2000
#> will read AB2001
#> will read AB2002
#> will read AB2003
#> will read AB2004
#> will read AB2005
#> will read AB2006
#> will read AB2007
#> will read AB2008
#> will read AB2189
#> will read AB2190
#> will read AB2191
#> will read AB2192
#> will read AB2195
#> will read AB2196
#> will read AB2197
#> will read AB2198
#> will read AB2201
#> will read AB2202
#> will read AB2203
#> will read AB2204
#> will read AB2297
#> will read AB2299
#> will read AB5407
#> will read AB5408
#> [1] TRUE
mat = scdb_mat("all")
print(dim(mat@mat))
#> [1] 34016 28032

The scdb_mat() command returns a matrix object, which has one slot containing the count matrix (mat@mat), as well as additional features we will mention below.

MetaCell uses a standardized naming scheme for the figures, to make it easier to archive and link analysis figures to the database objects.
In principle, figures in the figures directory are named after the object data type they refer to (for example, mat for matrices, mc for metacells, and more, see below).
The figure name then includes also the object name they refer to, and a suffix describing the actual figure type.

Exploring and filtering the UMI matrix

To get a basic understanding of the new data, we will plot the distribution of UMI count per cell (the plot is thresholded after 500 umi counts):

mcell_plot_umis_per_cell("all",min_umis_cutoff = 500)
#> [1] 500
Umi distribution plot

Umi distribution plot

We want to clean some known issues from the matrix before starting to work with it.
We generate a list of mitochondrial genes that typically mark cells as being stressed or dying, as well as immunoglobulin genes that may represent strong clonal signatures in plasma cells, rather than cellular identity.

mat = scdb_mat("all")
ery_genes = c("Hba-a2", "Alas2", "Hba-a1", "Hbb-b2", "Hba-x", "Hbb-b1")
ery_umis = as.matrix(mat@mat[ery_genes,])
pool_ery = colSums(ery_umis)
doublets = read.delim("annotations/doublets.txt",header=FALSE,stringsAsFactor = FALSE)[[1]]

nms = c(rownames(mat@mat), rownames(mat@ignore_gmat))
ig_genes = c(grep("^Igj", nms, v=T), 
                grep("^Igh",nms,v=T),
                grep("^Igk", nms, v=T), 
                grep("^Igl", nms, v=T))
bad_genes = unique(c(grep("ERCC|^mt", nms, v=T), ig_genes))
print(bad_genes)
#>   [1] "ERCC-00002" "ERCC-00003" "ERCC-00004" "ERCC-00009" "ERCC-00012"
#>   [6] "ERCC-00013" "ERCC-00014" "ERCC-00016" "ERCC-00017" "ERCC-00019"
#>  [11] "ERCC-00022" "ERCC-00024" "ERCC-00025" "ERCC-00028" "ERCC-00031"
#>  [16] "ERCC-00033" "ERCC-00034" "ERCC-00035" "ERCC-00039" "ERCC-00040"
#>  [21] "ERCC-00041" "ERCC-00042" "ERCC-00043" "ERCC-00044" "ERCC-00046"
#>  [26] "ERCC-00048" "ERCC-00051" "ERCC-00053" "ERCC-00054" "ERCC-00057"
#>  [31] "ERCC-00058" "ERCC-00059" "ERCC-00060" "ERCC-00061" "ERCC-00062"
#>  [36] "ERCC-00067" "ERCC-00069" "ERCC-00071" "ERCC-00073" "ERCC-00074"
#>  [41] "ERCC-00075" "ERCC-00076" "ERCC-00077" "ERCC-00078" "ERCC-00079"
#>  [46] "ERCC-00081" "ERCC-00083" "ERCC-00084" "ERCC-00085" "ERCC-00086"
#>  [51] "ERCC-00092" "ERCC-00095" "ERCC-00096" "ERCC-00097" "ERCC-00098"
#>  [56] "ERCC-00099" "ERCC-00104" "ERCC-00108" "ERCC-00109" "ERCC-00111"
#>  [61] "ERCC-00112" "ERCC-00113" "ERCC-00116" "ERCC-00117" "ERCC-00120"
#>  [66] "ERCC-00123" "ERCC-00126" "ERCC-00130" "ERCC-00131" "ERCC-00134"
#>  [71] "ERCC-00136" "ERCC-00137" "ERCC-00138" "ERCC-00142" "ERCC-00143"
#>  [76] "ERCC-00144" "ERCC-00145" "ERCC-00147" "ERCC-00148" "ERCC-00150"
#>  [81] "ERCC-00154" "ERCC-00156" "ERCC-00157" "ERCC-00158" "ERCC-00160"
#>  [86] "ERCC-00162" "ERCC-00163" "ERCC-00164" "ERCC-00165" "ERCC-00168"
#>  [91] "ERCC-00170" "ERCC-00171" "mt-Atp6"    "mt-Atp8"    "mt-Co1"    
#>  [96] "mt-Co2"     "mt-Co3"     "mt-Cytb"    "mt-Nd1"     "mt-Nd2"    
#> [101] "mt-Nd3"     "mt-Nd4"     "mt-Nd4l"    "mt-Nd5"     "mt-Nd6"    
#> [106] "mt-Rnr1"    "mt-Rnr2"    "mt-Ta"      "mt-Tc"      "mt-Td"     
#> [111] "mt-Te"      "mt-Tf"      "mt-Tg"      "mt-Th"      "mt-Ti"     
#> [116] "mt-Tk"      "mt-Tl1"     "mt-Tl2"     "mt-Tm"      "mt-Tn"     
#> [121] "mt-Tp"      "mt-Tq"      "mt-Tr"      "mt-Ts1"     "mt-Ts2"    
#> [126] "mt-Tt"      "mt-Tv"      "mt-Tw"      "mt-Ty"      "Igj"       
#> [131] "Ighg"       "Ighg1"      "Ighg2c"     "Ighm"       "Ighmbp2"   
#> [136] "Ighv1-12"   "Ighv1-18"   "Ighv1-19"   "Ighv1-20"   "Ighv1-23"  
#> [141] "Ighv1-24"   "Ighv1-42"   "Ighv1-43"   "Ighv1-5"    "Ighv1-54"  
#> [146] "Ighv1-59"   "Ighv1-62-2" "Ighv1-63"   "Ighv1-67"   "Ighv1-7"   
#> [151] "Ighv1-71"   "Ighv1-73"   "Ighv1-83"   "Ighv1-84"   "Ighv1-85"  
#> [156] "Ighv1-9"    "Ighv10-3"   "Ighv11-2"   "Ighv12-3"   "Ighv14-3"  
#> [161] "Ighv15-2"   "Ighv2-4"    "Ighv3-3"    "Ighv3-8"    "Ighv8-11"  
#> [166] "Ighv8-12"   "Ighv8-14"   "Ighv8-5"    "Ighv8-6"    "Ighv8-9"   
#> [171] "Ighv9-2"    "Ighv9-3"    "Igkc"       "Igkj1"      "Igkj2"     
#> [176] "Igkj3"      "Igkj4"      "Igkj5"      "Igkv1-110"  "Igkv1-115" 
#> [181] "Igkv1-117"  "Igkv1-122"  "Igkv1-131"  "Igkv1-132"  "Igkv1-133" 
#> [186] "Igkv1-135"  "Igkv1-35"   "Igkv1-88"   "Igkv1-99"   "Igkv10-94" 
#> [191] "Igkv10-95"  "Igkv10-96"  "Igkv11-125" "Igkv12-38"  "Igkv12-41" 
#> [196] "Igkv12-44"  "Igkv12-46"  "Igkv12-47"  "Igkv12-89"  "Igkv12-98" 
#> [201] "Igkv13-84"  "Igkv14-100" "Igkv14-111" "Igkv14-126" "Igkv14-130"
#> [206] "Igkv15-103" "Igkv16-104" "Igkv17-121" "Igkv17-127" "Igkv17-134"
#> [211] "Igkv18-36"  "Igkv19-93"  "Igkv2-109"  "Igkv2-112"  "Igkv2-116" 
#> [216] "Igkv2-137"  "Igkv3-1"    "Igkv3-10"   "Igkv3-12"   "Igkv3-2"   
#> [221] "Igkv3-3"    "Igkv3-4"    "Igkv3-7"    "Igkv3-9"    "Igkv4-50"  
#> [226] "Igkv4-53"   "Igkv4-55"   "Igkv4-56"   "Igkv4-57"   "Igkv4-57-1"
#> [231] "Igkv4-58"   "Igkv4-59"   "Igkv4-61"   "Igkv4-62"   "Igkv4-63"  
#> [236] "Igkv4-68"   "Igkv4-69"   "Igkv4-70"   "Igkv4-71"   "Igkv4-72"  
#> [241] "Igkv4-73"   "Igkv4-74"   "Igkv4-78"   "Igkv4-79"   "Igkv4-80"  
#> [246] "Igkv4-81"   "Igkv4-86"   "Igkv4-90"   "Igkv4-91"   "Igkv4-92"  
#> [251] "Igkv5-37"   "Igkv5-39"   "Igkv5-43"   "Igkv5-45"   "Igkv5-48"  
#> [256] "Igkv6-13"   "Igkv6-14"   "Igkv6-15"   "Igkv6-17"   "Igkv6-20"  
#> [261] "Igkv6-23"   "Igkv6-25"   "Igkv6-29"   "Igkv6-32"   "Igkv7-33"  
#> [266] "Igkv8-16"   "Igkv8-18"   "Igkv8-19"   "Igkv8-21"   "Igkv8-24"  
#> [271] "Igkv8-26"   "Igkv8-27"   "Igkv8-28"   "Igkv8-30"   "Igkv9-120" 
#> [276] "Igkv9-123"  "Igkv9-124"  "Igkv9-129"  "Iglc2"      "Iglc3"     
#> [281] "Igll1"      "Iglon5"     "Iglv1"      "Iglv2"      "Iglv3"

We will next ask the package to ignore the above genes and ignore erithrocytes:

mcell_mat_ignore_genes(new_mat_id=id, mat_id="all", bad_genes, reverse=F) 
mcell_mat_ignore_cells(new_mat_id=id,mat_id=id,ig_cells=c(names(which(pool_ery>64)),doublets),reverse=FALSE)

Ignored genes are kept in the matrix for reference, but all downstream analysis will disregard them.
This means that the number of UMIs from these genes cannot be used to distinguish between cells.

In the current example we will also eliminate cells with less than 500 UMIs (threshold can be set based on examination of the UMI count distribution):

mcell_mat_ignore_small_cells(id, id, 500)

Note that filtering decisions can be iteratively modified given results of the downstream analysis.

Selecting feature genes

We move on to computing statistics on the distributions of each gene in the data, which are going to be our main tool for selecting feature genes for MetaCell analysis:

set.seed(27)
mcell_add_gene_stat(gstat_id=id, mat_id=id, force=T)
#> Calculating gene statistics...
#> will downsamp
#> done downsamp
#> will gen mat_n
#> done gen mat_n
#> done computing basic gstat, will compute trends
#> ..done

This generates a new object of type gstat under the name gstatlung_kinetics, by analyzing the count matrix matlung_kinetics. We can explore interesting genes and their distributions:

gstat = scdb_gstat(id)
print(head(gstat))
#>                        name  tot       var is_on_count sz_cor sz_cor_norm
#> 0610005C13Rik 0610005C13Rik   25 0.0017513           7  0.001      -0.018
#> 0610007C21Rik 0610007C21Rik 4934 0.3718067        1067  0.217      -0.076
#> 0610007L01Rik 0610007L01Rik 4518 0.3672712         887  0.322       0.045
#> 0610007P08Rik 0610007P08Rik 2029 0.1806722         378  0.237       0.032
#> 0610007P14Rik 0610007P14Rik 3084 0.2404298         613  0.336       0.092
#> 0610007P22Rik 0610007P22Rik 1260 0.0923163         269  0.186       0.017
#>               niche_stat niche_norm    n_mean ds_top1 ds_top2 ds_top3
#> 0610005C13Rik          1          0 0.0005314       2       1       1
#> 0610007C21Rik          1          0 0.0830750       4       4       4
#> 0610007L01Rik          1          0 0.0672444       6       4       4
#> 0610007P08Rik          1          0 0.0274103       4       4       3
#> 0610007P14Rik          1          0 0.0411027       4       4       3
#> 0610007P22Rik          1          0 0.0184338       3       3       3
#>                 ds_mean    ds_var ds_log_varmean   ds_vm_norm
#> 0610005C13Rik 0.0004099 0.0005123      0.1578571  0.158020065
#> 0610007C21Rik 0.0628235 0.0775330      0.3014478 -0.002186932
#> 0610007L01Rik 0.0504228 0.0602841      0.2554940 -0.020394342
#> 0610007P08Rik 0.0216756 0.0267414      0.2971221  0.030240521
#> 0610007P14Rik 0.0343838 0.0400701      0.2180006 -0.050494882
#> 0610007P22Rik 0.0150653 0.0178114      0.2347471 -0.027689970
#>               ds_is_on_count downsample_n
#> 0610005C13Rik              7          750
#> 0610007C21Rik           1067          750
#> 0610007L01Rik            887          750
#> 0610007P08Rik            378          750
#> 0610007P14Rik            613          750
#> 0610007P22Rik            269          750
print(quantile(gstat$ds_vm_norm,c(1:20)/20))
#>            5%           10%           15%           20%           25% 
#> -0.1590463979 -0.1189811308 -0.0956332690 -0.0739704359 -0.0558974877 
#>           30%           35%           40%           45%           50% 
#> -0.0359508581 -0.0173696211 -0.0004101334  0.0000000000  0.0000000000 
#>           55%           60%           65%           70%           75% 
#>  0.0000000000  0.0152220223  0.0380227149  0.0620211302  0.0895131579 
#>           80%           85%           90%           95%          100% 
#>  0.1264445526  0.1679363186  0.2314046866  0.3923586117  5.5904714813
png("results/hist.gstats.t_vm.png")
hist(gstat$ds_vm_norm,breaks=c(floor(min(gstat$ds_vm_norm)*10):ceiling(max(gstat$ds_vm_norm)*10))/10,main = "Histogram of downsampled variance divided by mean")
abline(v=0.3)
axis(side = 1,at=0.3,labels="0.3")
dev.off()
#> png 
#>   2
t_vm = 0.3
Histogram of ds_vm_norm

Histogram of ds_vm_norm

Selecting a gene set for downstream analysis:
We create a new object of type gset (gene set), to which all genes whose scaled variance (variance divided by mean, AKA ds_vm_norm) exceeds a given threshold are added.
The command creates a new gene set with all genes for which the scaled variance is higher than 0.3, it also restricts this gene set to genes with at least 50 UMIs across the entire dataset, and also requires selected genes to have at least three cells for more than 4 UMIs were recorded.

mcell_gset_filter_multi(gstat_id=id, gset_id=id, T_tot=50, T_top3=4, T_vm = t_vm, force_new = T)
#> Selected 648 markers

Selected 648 markers

We update the gset object by removing a list of irrelevant genes which we don’t want them to affect the clustering process:

modules = read.csv("annotations/modules_big.csv", stringsAsFactors = F)
cc_genes = modules[ modules$annotation == "CC", "gene"]
ribo_genes = modules[ modules$annotation == "Ribo", "gene"]
other_genes = c("Ccnd2","Cdkn1c","H19","Hmga2","Igf2","Igfbp5","Mdk")
bad_marks = unique(c(bad_genes,cc_genes, ribo_genes,other_genes, "Malat1", "7SK","Xist", "mmu-mir-689-2", "Atp5g3", "Csta"))
save(file="saved_work/bad_marks.Rda",bad_marks)
gset_unfiltered = scdb_gset(id)
markers_to_keep = setdiff(names(gset_unfiltered@gene_set), bad_marks)
scdb_del_gset(id)
#> [1] TRUE
tmp = rep(1, length(markers_to_keep)); names(tmp) = markers_to_keep
new_gset = gset_new_gset(tmp, "ribos filtered out")
scdb_add_gset(id, new_gset)

Selected after filtration 627 markers

We can refine our parameters by plotting all genes and our selected gene set given the mean and variance statistics:

mcell_plot_gstats(gstat_id=id, gset_id=id)
#> png 
#>   2
var mean plot

var mean plot

Building the balanced cell graph

Assuming we are happy with the selected genes, we will move forward to create a similarity graph (cgraph), using a construction called balanced K-nn graph:

set.seed(27)
mcell_add_cgraph_from_mat_bknn(mat_id=id,gset_id = id,graph_id=id,K=100,dsamp=T)
#> will downsample the matrix, N= 750
#> will build balanced knn graph on 21111 cells and 627 genes, this can be a bit heavy for >20,000 cells
#> 19%...35%...51%...68%...86%...100%
#> sim graph is missing 11 nodes, out of 21111

This adds to the database a new cgraph object named cgraph.lung_kinetics.
The K=100 parameter is important, as it affects the size distribution of the derived metacells.

The knn procedure creates a graph with all the cells as nodes and weighted edges as a representation of the similiarty strength between pair of cells:

cgraph = scdb_cgraph(id)
kable(head(cgraph@edges))
mc1 mc2 w
W239068 W296614 1.00
W239068 W323024 0.99
W239068 W323459 0.98
W239068 W322850 0.97
W239068 W339307 0.96
W239068 W445358 0.95

Resampling and generating the co-clustering graph

The next step will use the cgraph to sample five hundred metacell partitions, each covering 75% of the cells and organizing them in dense subgraphs:

set.seed(27)
mcell_coclust_from_graph_resamp(coc_id=id,graph_id=id,min_mc_size=15,p_resamp=0.75, n_resamp=500)
#> running bootstrap to generate cocluster
#> 0%...1%...2%...6%...15%...18%...21%...27%...31%...35%...38%...42%...45%...49%...52%...57%...60%...64%...68%...72%...75%...79%...83%...86%...90%...93%...97%...100%
#> done resampling

The metacell size distribution of the resampled partitions will be largely determined by the K parameter used for computing the cgraph.
The resampling process may take a while if the graphs are very large. You can modify n_resamp to generate fewer resamples.

The resampling procedure creates a new coclust object in the database named coclust.lung_kinetics, and stores the number of times each pair of cells ended up being part of the same metacell (the cnt column).

coclust = scdb_coclust(id)
kable(head(coclust@coclust))
node1 node2 cnt
W239068 W239068 387
W239068 W239075 1
W239068 W239173 1
W239068 W239254 27
W239068 W239299 93
W239068 W239361 23

The co-clustering statistics are used to generate a new similarity graph, based on which accurate calling of the final set of metacells is done:

set.seed(27)
mcell_mc_from_coclust_balanced(coc_id=id,mat_id= id,mc_id= id,K=30, min_mc_size=15, alpha=2)
#> filtered 9696757 left with 1470381 based on co-cluster imbalance
#> building metacell object, #mc 269
#> add batch counts
#> compute footprints
#> 50%...92%...100%
#> compute absolute ps
#> 47%...88%...100%
#> compute coverage ps
#> 54%...99%...100%
#> reordering metacells by hclust and most variable two markers
#> reorder on Tyrobp vs Fstl1

We created a metacell object mc.lung_kinetics based on analysis of the co-clustering graph.
The parameter K determines the number of neighbors we wish to minimally associate with each cell.
Prior to partitioning the co-cluster graph is filtered to eliminate highly unbalanced edges, with smaller alpha resulting in harsher filtering.

Creating heatmaps of metacells and genes

We will first assign random colors to our clusters (these can later be modified with custom color definitions, e.g. based on cell type assignments).

mc<- scdb_mc(id)
mc@colors <- colorRampPalette(c("darkgray", "burlywood1", "chocolate4","orange", "red", "purple", "blue","darkgoldenrod3", "cyan"))(ncol(mc@mc_fp))
scdb_add_mc(id,mc)
mc<- scdb_mc(id)

The metacell object mc.lung_kinetics can now be visualized.
In order to do this effectively, we usually go through one or two iterations of selecting informative marker genes.
The package can select markers for you automatically - by simply looking for genes that are strongly enriched in any of the metacells:

mcell_gset_from_mc_markers(gset_id=paste0(id,"_markers"), mc_id=id)
mcell_mc_plot_marks(mc_id=id, gset_id=paste0(id,"_markers"), mat_id=id,plot_cells = T)
heatmap_marks_mc

heatmap_marks_mc

Selecting markers and coloring metacells

We can take a look on the distribution of gene markers (requires a prior literature review), and generate a colorize table.
Assume we have a marker genes table like that:

name gene T_fold
MacI Cx3cr1 2.7
MacII Ear2 4.2
MacIII Ccl6 16.6
Mon Ccr2 2.0
Mon F13a1 7.0
Mon Fcgr4 3.5

The values plotted are color coded log2(fold enrichment) and fold enrichment values of the metacell over the median of all other metacells:

mc = scdb_mc(id)
nx = ceiling(dim(unique(markers_table[,c("name","gene")]))[1]/3)
ny = 3
lfp <- log2(mc@mc_fp)
png("results/genes_log_distribution.png",w=5500,h=(11000/nx) * ny)
layout(matrix(1:(nx*ny), nx , ny , byrow=T))
for(cell_type in unique(markers_table$name)){
  for(gene in markers_table[markers_table$name == cell_type,"gene"]){
    par(mar=c(5,4,5,4),xpd=TRUE)
    cap = paste(cell_type,gene,sep="_")
    barplot(lfp[gene,],col=mc@colors,las=2,cex.axis=2,ylab="log2FC",xlab="metacells")
    abline(h=log2(markers_table[markers_table$name == cell_type & markers_table$gene == gene,"T_fold"]))
    title(main = cap,cex.main=3.6)
  }
}
dev.off()
#> png 
#>   2
Genes log distribution

Genes log distribution

#> png 
#>   2
Genes distribution

Genes distribution

Using the correct genes and T_fold threshold (the horizontal line in each barplot), as well as priority, is necessary in order to colorize the metacells.
There are other methods we can use in order to find the best marker genes (and thresholds).
For example,XY plot of the footprint score (mc@mc_fp) of each mc in 2 different genes:

plot_two_genes_fp = function(mc_id, ga, gb, log = T) {
  mc = scdb_mc(mc_id)
  fp = mc@mc_fp
  if (log) {
    fp = log2(fp)
  }
  a = fp[ga,]; b = fp[gb,]
    plot(a,b, xlab = ga, ylab = gb, pch = 21, cex = 2.5, bg = mc@colors)
    text(a,b, names(a),cex = 0.8)
    return(data.frame(a = a, b = b)) 
}

pairs = list(c("Acta2","Tgfbi"),c("Flt3","Cst3"),c("Flt3","H2-Aa"),c("Ccl5","Gzma"),c("Ccl5","Trbc2"),c("Gzma","Trbc2"),c("Flt3","Ccr2"),c("Cst3","Ccr2"),c("Flt3","Cx3cr1"),c("Cst3","Cx3cr1"),c("Rora","Il7r"))
if(!dir.exists("results/genes_comp_fc")) dir.create("results/genes_comp_fc/")
for(pair in pairs){
  ga = pair[1]; gb = pair[2]
  png(paste0("results/genes_comp_fc/",ga,"_",gb,".png"))
  ga_gb_df = plot_two_genes_fp(id,ga,gb,log=F)
  dev.off()
}

In this example we can decide that Tgfbi is a sufficent discrimnator for smooth muscle fibroblast cells
Example gene1 vs gene2

Finally we can genearte a colorizing table based on marker genes expression.

Let us take a look at our colorizing table:
@column1 group the name of the cell type @column2 gene the name of the marker gene for this cell type (each cell type can have multiple corresponding genes, but each gene can have only one corresponding cell type) @column3 color the corresponding color for this specific cell type @column4 priority for cases of passing the threshold in two different cell types:
If a metacell x has high footprint score in two different cell types markers (g1 and g2) (mc@mc_fp[g1,x] > T_fold1 AND mc@mc_fp[g2,x] > T_fold2), the gene (and its corresponding cell type) with the higher score weighted by priority is chosen (max(priority1 \(\cdot\) log2(mc@mc_fp[g1,x]),priority2 \(\cdot\) log2(mc@mc_fp[g2,x])))

marks_colors = read.delim("annotations/mc_colorize.txt", sep="\t", stringsAsFactors=F)
kable(marks_colors)
group gene color priority T_fold
MacI Cx3cr1 #A9C4A1 3 2.7
MacII Ear2 #118437 2 4.2
MacIII Ccl6 #0E431F 3 30.0
Mon Ccr2 #59B56F 2 2.0
Mon F13a1 #59B56F 2 7.0
Mon Fcgr4 #59B56F 3 3.5
Neut Retnlg #6BAC07 1 17.0
Neut S100a8 #6BAC07 1 20.0
Baso Mcpt8 #D6D812 3 6.0
Mast Mcpt4 #C3B820 1 3.0
Mast Cpa3 #C3B820 2 20.0
DC Flt3 #2A8EAA 5 1.3
DC Cst3 #2A8EAA 2 9.0
B Cd79b #79A3FF 1 3.0
T Trbc2 #5E83BB 2 6.0
ILC Rora #1D50FF 4 3.5
NK Gzma #5131BB 1 30.0
Endothel Cdh5 #AB7737 3 3.0
Fibro Col1a2 #D8B08D 1 2.0
Smooth Tgfbi #EA5162 3 7.0
Matrix Mfap4 #ED6C30 3 8.0
Pericytes Gucy1a3 #C51711 4 20.0
Epithel Epcam #D3AFD0 1 2.5
AT1 Clic5 #B17BBC 2 8.0
AT2 Sftpc #BA5C7C 2 65.0
Club Scgb3a2 #9728AC 3 40.0
Ciliated Ccdc19 #551a8b 3 2.0

Applying this table to color metacells is done using the command mc_colorize as shown below.
@param new_mc_id output metacell id in scdb
@param mc_id input metacell id in scdb
@param marker_color a data frame with fields gene, group, color, priority, thresh
@param override if this is true, all colors are going to be set to white unless some marker match is found

mc_colorize(new_mc_id = id, mc_id = id, marker_colors=marks_colors,override=T)

We are now equipped with some basic coloring of metacells, which can also be accessed directly:

mc = scdb_mc(id)
table(mc@colors)
#> 
#> #0E431F #118437 #1D50FF #2A8EAA #5131BB #551a8b #59B56F #5E83BB #6BAC07 
#>       3      45       2       6       2       1      30       4      31 
#> #79A3FF #9728AC #A9C4A1 #AB7737 #B17BBC #BA5C7C #C3B820 #C51711 #D3AFD0 
#>       8       1       6      42      16      13       2       1      10 
#> #D6D812 #D8B08D #EA5162 #ED6C30 
#>       2      21       9      13

Visualizing the MC confusion matrix in order to colorize the metacells

While 2D projections are popular and intuitive (albeit sometimes misleading) ways to visualize scRNA-seq results, we can also summarize the similarity structure among metacells using a “confusion matrix” which encodes the pairwise similarities between all metacells.
This matrix may capture hierarchical structures or other complex organizations among metacells.

We first create a hierarchical clustering of metacells, based on the number of similarity relations between their cells:

set.seed(27)
mc_hc = mcell_mc_hclust_confu(mc_id=id,graph_id=id)

Next, we generate clusters of metacells based on this hierarchy, and visualize the confusion matrix and these clusters.
The confusion matrix is shown at the bottom, and the top panel encodes the cluster hierarchy (subtrees in blue, sibling subtrees in gray):

set.seed(27)
mc_sup = mcell_mc_hierarchy(mc_id=id,mc_hc=mc_hc, T_gap=0.04)
save(file="saved_work/mc_hc_sup.Rda",mc_hc,mc_sup)
mcell_mc_plot_hierarchy(mc_id=id,graph_id=id,mc_order=mc_hc$order,sup_mc = mc_sup,width=3500, height=3500, min_nmc=2)
#> png 
#>   2
confusion matrix

confusion matrix

After exploring the confusion matrix, we would like to remove mc 269 because it doesn’t have a significant signature of any relevant cell type. We would like also to define sup 128 and 142 as a super MCs of Monocytes, 148 as a super MC of Macrophages type III and 171 as a super MC of Macrophages type II It is a little bit challenging to find the “correct” marker genes, thresholds and priority in order to apply with the simple mc_colorize function.
So we can update it manually:

ord_id = paste0(id,"_sorted")
mc = scdb_mc(id)
scdb_add_mc(id = paste0(id,"_pre_f"),mc = mc)
good_cells = names(mc@mc[mc@mc!=269])
mc@mc = mc@mc[good_cells]
mc@cell_names = good_cells
mc@mc_fp = mc@mc_fp[,-269]
mc@e_gc = mc@e_gc[,-269]
mc@cov_gc = mc@cov_gc[,-269]
mc@n_bc = mc@n_bc[,-269]
mc@annots = mc@annots[which(names(mc@annots)!=269)]
mc@colors = mc@colors[-269]
mc@colors[mc_sup[[128]]$mcs] = "#59B56F"
mc@colors[mc_sup[[142]]$mcs] = "#59B56F"
mc@colors[mc_sup[[148]]$mcs] = "#0E431F"
mc@colors[mc_sup[[171]]$mcs] = "#118437"
mat = scdb_mat(id)
mcell_mat_ignore_cells(ord_id,id,ig_cells = good_cells,reverse = T)
scdb_add_mc(id = ord_id,mc = mc)
cgraph = scdb_cgraph(id)
cgraph@cell_names = cgraph@cell_names[cgraph@cell_names %in% good_cells]
cgraph@edges = cgraph@edges[cgraph@edges$mc1 %in% good_cells & cgraph@edges$mc2 %in% good_cells,]
cgraph@nodes = cgraph@nodes[cgraph@nodes %in% good_cells]
scdb_add_cgraph(ord_id,cgraph)
sc_coc = scdb_coclust(id)
sc_coc@coclust = sc_coc@coclust[sc_coc@coclust$node1 %in% good_cells & sc_coc@coclust$node2 %in% good_cells,]
sc_coc@n_samp = sc_coc@n_samp[names(sc_coc@n_samp) %in% good_cells]
scdb_add_coclust(ord_id,sc_coc)

Re-running the hierarchial clustering for further use

set.seed(27)
mc_hc = mcell_mc_hclust_confu(mc_id=ord_id,graph_id=ord_id)
set.seed(27)
mc_sup = mcell_mc_hierarchy(mc_id=ord_id,mc_hc=mc_hc, T_gap=0.04)
save(file="saved_work/mc_hc_sup_f.Rda",mc_hc,mc_sup)
mcell_mc_plot_hierarchy(mc_id=ord_id,graph_id=ord_id,mc_order=mc_hc$order,sup_mc = mc_sup,width=3500, height=3500, min_nmc=2)
#> png 
#>   2
confusion matrix

confusion matrix

Re ordering our mc object

We want to re order the metacells by the new hierarchial clusering order

mc = scdb_mc(ord_id)
load("saved_work/mc_hc_sup_f.Rda")
lin_ord = c("Endothel","Fibro","Matrix","Smooth","Pericytes","Epithel","AT1","AT2","Club","Ciliated","MacI","MacII","MacIII","Mon","Neut", "Baso", "Mast","DC","B", "T", "ILC","NK")
fac = mc@colors[mc_hc$order]
names(fac) = mc_hc$order
color_key = unique(mc@color_key[,c("group","color")])
name2color = color_key$color
names(name2color) = color_key$group
fac = sort(factor(fac,levels = name2color[lin_ord]))
mc_ord = as.integer(names(fac))
sorted_mc = mc_reorder(mc,mc_ord)
scdb_add_mc(id=ord_id,mc = sorted_mc)
mc = scdb_mc(ord_id)

A comprehensive map of the lung cell types during development

Projecting metacells and cells in 2D

We may want to visualize the similarity structure among metacells (or among cells within metacells).
We construct a 2D projection of the metacells, and use it to plot the metacells and key similarities between them (shown as connecting edges), as well as the cells. This plot will use the same metacell coloring we established before.

# source(src_functions)
scfigs_init("results/figure1")
tgconfig::override_params("annotations/lung_params.yaml","metacell")
set.seed(27)
mcell_mc2d_force_knn(mc2d_id= ord_id,mc_id=ord_id, graph_id=ord_id)
#> comp mc graph using the graph lung_kinetics_sorted and K 20
mc2d = scdb_mc2d(ord_id)
new_mc_y = mc2d@mc_y
new_mc_y[212:268] = new_mc_y[212:268] + 25
new_mc_y[30] = new_mc_y[29] + 60
new_mc_y[246] = new_mc_y[245]
new_mc_y[245] = new_mc_y[245] + 25
xy_df = data.frame(mc_x = mc2d@mc_x, mc_y = new_mc_y)
mcell_mc2d_force_knn_on_cells(mc2d_id = ord_id,mc_id = ord_id,graph_id = ord_id,mc_xy = xy_df)
#> comp mc graph using the graph lung_kinetics_sorted and K 20
tgconfig::set_param("mcell_mc2d_width",2200, "metacell")
tgconfig::set_param("mcell_mc2d_height",2000, "metacell")
# local_mcell_mc2d_plot(mc2d_id=ord_id,plot_edges = T)  
mcell_mc2d_plot(mc2d_id=ord_id,plot_edges = T)  
#> png 
#>   2
scfigs_init("results/")

Note that we changed the metacell parameters “mcell_mc2d_height/width” to get a reasonably-sized figure.

We created a two-dimensional projection of 21033 single cells from 17 mice from all time points were analyzed. 268 metacells were associated with 22 cell types and states, annotated, and marked by color code.

2d projection of single cells onto a graph representation

2d projection of single cells onto a graph representation

Creating a heatmap of genes and metacells

We can use the colors to produce a labeled heatmap, showing selected genes and their distributions over metacells, with the colored annotation shown at the bottom:

mcell_gset_from_mc_markers(gset_id=paste0(ord_id,"_markers"), mc_id=ord_id)
mcell_mc_plot_marks(mc_id=ord_id, gset_id=paste0(ord_id,"_markers"), mat_id=ord_id,plot_cells=TRUE)
heatmap_marks_after_colorizing

heatmap_marks_after_colorizing

We would like to create another heatmap:
We order the metacells by their kinetic group, then choose marker genes by a predefined list and the top 10 genes in each metacell

tgconfig::override_params("annotations/lung_params.yaml","metacell")
source(src_functions)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colMeans, colnames, colSums, dirname, do.call, duplicated,
#>     eval, evalq, Filter, Find, get, grep, grepl, intersect,
#>     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
#>     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
#>     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which, which.max,
#>     which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'pcaMethods'
#> The following object is masked from 'package:stats':
#> 
#>     loadings
#> Loading required package: graph
#> Loading required package: grid
#> 
#> Attaching package: 'plyr'
#> The following object is masked from 'package:graph':
#> 
#>     join
ord_id = paste0(id,"_sorted")
mc = scdb_mc(ord_id)
mat = scdb_mat(ord_id)
lfp = log2(mc@mc_fp)
cell_stats = mat@cell_metadata[names(mc@mc),]
lin_markers = c("Ccl5","Trbc2","Cd19","Retnlg","Mcpt8","Mcpt4","F13a1","Ear2","Epcam","Cdh5","Col1a2","Akap5","Lamp3","Scgb3a2","Foxj1","Enpp2","Mfap4","Gucy1a3")
lin_ord = c("Endothel","Fibro","Matrix","Smooth","Pericytes","Epithel","AT1","AT2","Club","Ciliated","MacI","MacII","MacIII","Mon","Neut", "Baso", "Mast","DC","B", "T", "ILC","NK")
tp_cols = c("bisque", "lightgoldenrod1", "gold", "chocolate2", "coral3", "brown3", "indianred4")
tps = factor(cell_stats$kinetic.group, levels = c("E12.5", "E16.5", "E_late", "P_early", "P_mid", "P_d2", "P_d7"))
names(tps) = rownames(cell_stats)
color_key = unique(mc@color_key[,c("group","color")])
color2name = color_key$group
names(color2name) = color_key$color

nms = choose_genes_from_mc(mc = mc, mat = mat, nms_per_mc = 10,nms_thresh = 2)
tp_dist = table(tps %in% c("E12.5", "E16.5", "E_late"), mc@mc)
dist_n = tp_dist / rowSums(tp_dist)
mc_names = color2name[mc@colors]
names(mc_names) = 1:length(mc@colors)

clust_ord = as.integer(unlist(tapply(1:length(mc@colors),factor(mc_names,levels = lin_ord) , function(x) if(length(x) == 1){return(x);} else{return(as.integer(names(sort((dist_n[1,x] > dist_n[2,x])/length(x)))))})))
mc_ord = factor(1:length(mc@colors),levels = clust_ord)

nms = choose_genes_from_mc(mc = mc, mat = mat, good_mcs = clust_ord,nms_per_mc = 5,nms_thresh = 3, max_num = 80, bad_genes = bad_marks, ord= "max.col", must_haves = union(lin_markers,mc@color_key$gene))
gene_set = rep(1,length(nms))
names(gene_set) = rev(nms)
man_gset = gset_new_gset(gene_set,"manual_gset_markers")
scdb_add_gset(id = paste0(ord_id,"_man_markers"),gset = man_gset)

if(!dir.exists("results/figureS1")) dir.create("results/figureS1/")
heatmap_tp(mc_id = ord_id,gset_id = paste0(ord_id,"_man_markers"),mat_id = ord_id,fig_fn ="results/figureS1/heatmap_by_tp.png",mc_ord = clust_ord,tps = tps,tp_cols = tp_cols)
#> png 
#>   2

We created a heatmap for gene expression of key markers across single cells from both immune and non-immune compartments.
Lower panels indicate association to cell type (color bars represent cell type), and developmental time-point of each single cell.

heatmap_marks_by_time_points

heatmap_marks_by_time_points

2D projection by Genes

Expression quantiles of key cell-type-specific marker genes on top of the 2D map of lung development.

tgconfig::override_params("annotations/lung_params.yaml","metacell")
lin_markers = c("Ccl5","Trbc2","Cd19","Retnlg","Mcpt8","Mcpt4","F13a1","Ear2","Epcam","Cdh5","Col1a2","Akap5","Lamp3","Scgb3a2","Foxj1","Enpp2","Mfap4","Gucy1a3")
b = 9
mat = scdb_mat(ord_id)
mc2d = scdb_mc2d(ord_id)
palette = c("white", "cornsilk1", "orange","red3", "purple4", "midnightblue")
ny = ceiling(length(lin_markers)/6)
nx = 6
png("results/figure1/genes_2dproj.png",w=4200,h=(4200/nx) * ny)
layout(matrix(1:(nx*ny), ny , nx , byrow=T))
for (val in lin_markers) {
  vals = as.matrix(mat@mat)[val,]
  norm_val = rep(1, length(vals))
  names(norm_val) = names(vals)
  norm_val[ vals != 0] = as.numeric(cut(vals[ vals != 0], unique(quantile(vals[ vals != 0], (0:b)/b)), include.lowest = T)) + 1
  cols = colorRampPalette(palette)(max(norm_val))
  par(mar=c(0.5,0.5,3,0.5),xpd=TRUE)
  plot(mc2d@sc_x, mc2d@sc_y, pch = 20, col = "gray80", cex=1, axes = F, xlab = "", ylab = "")
    exp_cells = names(which(vals > 0))
    points(mc2d@sc_x[exp_cells], mc2d@sc_y[exp_cells], cex = 1 + 0.4 * round((norm_val[exp_cells] - 1) / max(norm_val) * 5),pch = 21, bg = cols[norm_val[exp_cells]])
    title(main=val, cex.main=3.6)
}
dev.off()
#> png 
#>   2
Genes 2D distribution

Genes 2D distribution

Coculstering

Log values of the co-clustering structure of both compartments, as assessed by bootstrapping analysis. Color bars represent cell types.

tgconfig::override_params("annotations/lung_params.yaml","metacell")
mcp_heatmap_text_cex = tgconfig::get_param("mcp_heatmap_text_cex",package = "metacell")
mcp_2d_legend_cex = tgconfig::get_param("mcell_mc2d_legend_cex",package = "metacell")
sc_coc = scdb_coclust(ord_id)
mc = scdb_mc(ord_id)
X = sc_coc@coclust
Y = matrix(0,  length(mc@mc), length(mc@mc), dimnames = list(names(mc@mc), names(mc@mc)))
Y[cbind(as.vector(X$node1), as.vector(X$node2))] = X$cnt
lin_ord = c("Endothel","Fibro","Matrix","Smooth","Pericytes","Epithel","AT1","AT2","Club","Ciliated","MacIII","MacII","MacI","Mon","DC", "Neut", "Baso", "Mast", "B", "T", "NK", "ILC")
color_key = unique(mc@color_key[,c("group","color")])
name2color = color_key$color
names(name2color) = color_key$group
color2name = color_key$group
names(color2name) = color_key$color

lung_names = as.character(color2name[mc@colors])
names(lung_names) = 1:length(lung_names)
clust_ord = as.integer(names(sort(factor(lung_names, levels = lin_ord))))
cell_ord = names(mc@mc[ order(factor(mc@mc, levels = clust_ord))])
IM = Y[cell_ord, cell_ord]
coc_shades = colorRampPalette(c("white", "gold", "orange2", "tomato3", "red4", "purple4", "black"))(1000)
IM = pmax(IM, t(IM))
if(!dir.exists("results/figureS1")) dir.create("results/figureS1/")
png("results/figureS1/coclustering.png", height = 2000, width = 2200)
par(mar = rep(0.5,4), fig = c(0.05,0.85,0.05,1),xpd=TRUE)
image(log(1 + IM), col = coc_shades, axes = F)
cls = cumsum(table(factor( mc@mc, levels = clust_ord))) / length(mc@mc)
cls_wide = cumsum(table(factor( color2name[mc@colors[mc@mc]], levels = lin_ord))) / length(mc@mc)
abline(h = cls, v = cls, lty = 2, lwd = 1, col = "gray20")
abline(h = cls_wide, v = cls_wide, lty = 1, lwd = 2, col = "black")
par(mar = rep(0.5,4), fig = c(0,0.05,0.05,1), new = T)
image(t(matrix(as.numeric(factor(lung_names[as.integer(mc@mc[colnames(IM)])], levels = lin_ord)))), axes = F, 
    col = name2color[lin_ord], zlim = c(1, length(lin_ord)))
par(mar = rep(0.5,4), fig = c(0.05,0.85,0,0.05), new = T)
image(matrix(as.numeric(factor(lung_names[as.integer(mc@mc[colnames(IM)])], levels = lin_ord))), axes = F, 
    col = name2color[lin_ord], zlim = c(1, length(lin_ord)))
par(mar = c(1,3,1,10), fig = c(0.85,1,0.05,0.3),new = T)
image(t(matrix(1:100)), axes = F, col = coc_shades)
mtext(as.integer(min(log(1 + IM))),side=1,at = 0.25,las = 1,line=1, cex=mcp_heatmap_text_cex+2)
mtext(as.integer(max(log(1 + IM))),side=3,at = 0.25,las = 1,line=1, cex=mcp_heatmap_text_cex+2)
mtext("log co-clustering",side = 4, at = 0.5, cex=mcp_heatmap_text_cex + 2, line = 3)
par(mar = rep(2,4), fig = c(0.85,1,0.3,1),new = T)
legend("topleft", legend = as.character(color2name), pch = 22, cex = mcp_2d_legend_cex + 2,pt.cex = mcp_2d_legend_cex + 4, pt.bg = names(color2name),col="black", bty = "n")
dev.off()
#> png 
#>   2
Coclustering

Coclustering